Abstract-Driven Pattern Discovery in Databases
Authors: Vasant Dhar, Alexander Tuzhilin
Abstract
Vasant Dhar and Alexander Tuzhilin
Department of Information, Operations and Management Sciences, Leonard N. Stern School of Business, New York University, 44 West 4th Street, New York, NY 10012
vdhar@stern.nyu.edu, atuzhilin@stern.nyu.edu
Center for Digital Economy Research, Stern School of Business, Working Paper IS-93-11

In this paper, we study the problem of discovering interesting patterns in large volumes of data. Patterns can be expressed not only in terms of the database schema but also in user-defined terms, such as relational views and classification hierarchies. The user-defined terminology is stored in a data dictionary that maps it into the language of the database schema. We define a pattern as a deductive rule, expressed in user-defined terms, that has a degree of certainty associated with it. We present methods of discovering interesting patterns based on abstracts, which are summaries of the data expressed in the language of the user.

Keywords: pattern discovery, data abstraction, classification, generalization

… database. Attribute values are replaced by the set to which they belong. Han et al. [HCC92] use a similar technique to search for dependencies among the abstracted attribute values and also incorporate a probability measure into the dependency. Our approach generalizes Walker's and Han et al.'s in that attribute values in an abstracted database can also be predicates or views of the original database that depend on multiple attributes. We also allow a variety of functions, such as summation and averaging, to be used in addition to counting for aggregating attribute values. Other differences will be described after presenting our model in Section 5.

In order to describe pattern discovery, we first need a precise definition of a pattern. There is no standard definition of the term in the literature. In trying to draw a common thread through a recent collection of papers on "Knowledge Discovery in Databases," Frawley et al. [FPSM91] define patterns as follows:

Given a set of facts (data) F, a language L, and some measure of certainty C, a pattern is a statement S in L that describes relationships among a subset F_S of F with certainty C, such that S is simpler (in some sense) than the enumeration of all facts in F_S.

This definition is intentionally vague so as to cover a wide variety of approaches. For example, even a set of statistical parameters, such as the mean and standard deviation of a collection of numerical values, qualifies as a pattern under this definition. In fact, any abstraction that in some sense summarizes the data would satisfy it. In contrast, we define a pattern in a more restricted sense: as a rule that has a degree of certainty associated with it. The precise form of the rule will be described in Section 3.
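The abstraction step described in the excerpt (replacing attribute values with the user-defined category they belong to, then aggregating by counting, summation, or averaging) and the notion of a pattern as a rule with a degree of certainty can be illustrated with a small sketch. This is not the authors' method, whose precise rule form appears in Section 3 of the paper and is not reproduced here. The table `SALES`, the hierarchy `CITY_TO_REGION`, the functions `by_region_and_product`, `abstract`, and `rule_certainty`, and the choice of conditional frequency as the certainty measure are all hypothetical assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical user-defined classification hierarchy: maps a raw attribute
# value (city) to the category it belongs to (region).
CITY_TO_REGION = {
    "Boston": "Northeast",
    "New York": "Northeast",
    "Austin": "South",
    "Dallas": "South",
}

# Hypothetical raw tuples: (city, product, amount).
SALES = [
    ("Boston", "laptop", 1200.0),
    ("New York", "laptop", 900.0),
    ("New York", "phone", 600.0),
    ("Austin", "phone", 400.0),
    ("Dallas", "phone", 500.0),
    ("Dallas", "laptop", 1100.0),
    ("Austin", "phone", 300.0),
]

# Aggregation functions applied to each group of measure values.
AGGREGATES = {"count": len, "sum": sum, "avg": mean}


def by_region_and_product(row):
    """Hypothetical abstraction key: the user-level terms a tuple maps to.

    Because it receives the whole tuple, the key may depend on several
    attributes, much like a view or predicate over the original data.
    """
    city, product, _amount = row
    return (CITY_TO_REGION[city], product)


def abstract(rows, group_key, measure, aggregates):
    """Build an abstract: group tuples by their abstracted key and
    summarize each group's measure values with every aggregate function."""
    groups = defaultdict(list)
    for row in rows:
        groups[group_key(row)].append(measure(row))
    return {
        key: {name: fn(values) for name, fn in aggregates.items()}
        for key, values in groups.items()
    }


def rule_certainty(summary, region, product):
    """Certainty of the rule 'region = R -> product = P', estimated here
    (an assumption) as count(R, P) / count(R) over the abstract.
    Requires that the abstract was built with a 'count' aggregate."""
    total = sum(s["count"] for (r, _), s in summary.items() if r == region)
    hits = summary.get((region, product), {"count": 0})["count"]
    return hits / total if total else 0.0


if __name__ == "__main__":
    summary = abstract(SALES, by_region_and_product, lambda row: row[2], AGGREGATES)
    print(summary[("South", "phone")])  # {'count': 3, 'sum': 1200.0, 'avg': 400.0}
    print(rule_certainty(summary, "South", "phone"))  # 0.75
```

Passing the grouping key and the aggregates in as functions is one way to mirror the excerpt's point that abstracted values may be views over several attributes and that aggregation is not limited to counting. Conditional frequency is only one possible certainty measure; the paper's own measure may differ.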
Similar resources
Handling large databases in data mining
M. Mehdi Owrang O. American University, Dept of Computer Science & IS, Washington DC 20016 ABSTRACT Current database technology involves processing a large volume of data in order to discover new knowledge. The high volume of data makes discovery process computationally expensive. In addition, real-world databases tend to be incomplete, redundant, and inconsistent that could...
Distributed Search and Pattern Matching
Peer-to-peer (P2P) technology has triggered a wide range of distributed applications including file-sharing, distributed XML databases, distributed computing, server-less web publishing and networked resource/service sharing. Despite the diversity in applications, these systems share common requirements for searching due to transitory node populations and content volatility. In such dynamic e...
Actionable Rules: Issues and New Directions
Knowledge Discovery in Databases (KDD) is the process of extracting previously unknown, hidden and interesting patterns from a huge amount of data stored in databases. Data mining is a stage of the KDD process that aims at selecting and applying a particular data mining algorithm to extract interesting and useful knowledge. It is highly expected that data mining methods will find interesting...
Pattern Discovery in Temporal Databases: A Temporal Logic Approach
This work was supported in part by the NSF under grant IRI-93-18773. Abstract: The work of Mannila et al. [4] of finding frequent episodes in sequences is extended to finding temporal logic patterns in temporal databases. It is argued that temporal logic provides an appropriate formalism for expressing temporal patterns defined over categorical data. It is also proposed to use Temporal Logic P...
Journal: IEEE Trans. Knowl. Data Eng.
Volume: 5
Publication date: 1993